
Multi-Agent Credit Assignment in Stochastic Resource Management Games



Abstract

Multi-Agent Systems (MAS) are a form of distributed intelligence in which multiple autonomous agents act in a common environment. Numerous complex, real-world systems have been successfully optimised using Multi-Agent Reinforcement Learning (MARL) in conjunction with the MAS framework. In MARL, agents learn by maximising a scalar reward signal from the environment, so the design of the reward function directly affects the policies learned. In this work, we address the issue of appropriate multi-agent credit assignment in stochastic resource management games. We propose two new Stochastic Games to serve as testbeds for MARL research into resource management problems: the Tragic Commons Domain and the Shepherd Problem Domain. Our empirical work evaluates the performance of two commonly used reward shaping techniques: Potential-Based Reward Shaping and difference rewards. Experimental results demonstrate that systems using appropriate reward shaping techniques for multi-agent credit assignment can achieve near-optimal performance in stochastic resource management games, outperforming systems that learn from unshaped local or global evaluations. We also present the first empirical investigations into the effect of expressing the same heuristic knowledge in state-based or action-based formats, thereby developing insights into the design of multi-agent potential functions that will inform future work.
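For readers unfamiliar with the two reward shaping techniques named in the abstract, the following is a minimal sketch of their standard textbook forms, not the paper's implementation. Potential-Based Reward Shaping adds the term gamma * Phi(s') - Phi(s) to the environment reward, and a difference reward compares the global evaluation with a counterfactual in which agent i's action is replaced by a default. The potential function `phi`, the global evaluation `commons_evaluation`, the capacity of 3, and the default action 0 are all hypothetical choices made for illustration; none of them comes from the Tragic Commons or Shepherd Problem domains.

```python
from typing import Callable, List, Sequence


def pbrs_bonus(phi: Callable[[int], float], s: int, s_next: int,
               gamma: float = 0.99) -> float:
    """Shaping term F(s, s') = gamma * phi(s') - phi(s), added to the reward."""
    return gamma * phi(s_next) - phi(s)


def difference_reward(G: Callable[[Sequence[int]], float],
                      joint_actions: Sequence[int],
                      i: int,
                      default_action: int = 0) -> float:
    """D_i = G(z) - G(z with agent i's action replaced by a default action)."""
    counterfactual: List[int] = list(joint_actions)
    counterfactual[i] = default_action
    return G(joint_actions) - G(counterfactual)


# Hypothetical global evaluation for a toy shared resource (assumption):
# each agent harvests 0 or 1 unit; harvest beyond the capacity is penalised.
CAPACITY = 3


def commons_evaluation(actions: Sequence[int]) -> float:
    total = sum(actions)
    return min(total, CAPACITY) - max(0, total - CAPACITY)


# Hypothetical state-based potential (assumption): states closer to a
# target resource level of 3 have higher potential.
def phi(state: int) -> float:
    return -abs(state - 3)


if __name__ == "__main__":
    # Agent 0's contribution when the resource is over-harvested vs. not.
    print(difference_reward(commons_evaluation, [1, 1, 1, 1], i=0))
    print(difference_reward(commons_evaluation, [1, 1, 0, 0], i=0))
    # Shaping bonus for moving from state 5 to state 4 (towards the target).
    print(pbrs_bonus(phi, 5, 4, gamma=1.0))
```

In this toy setting the difference reward is negative for an agent whose harvest pushes the total past capacity and positive otherwise, which is the kind of per-agent signal that credit assignment aims to provide.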
